Korean Erroneous Sentence Classification With Integrated Eojeol Embedding

Authors

Abstract

This paper attempts to analyze the Korean sentence classification system. Sentence classification is the task of classifying an input sentence based on predefined categories. However, spelling or spacing errors contained in the input sentence cause problems in morphological analysis and tokenization. This paper proposes a novel approach, Integrated Eojeol (Korean syntactic word separated by space) Embedding, to reduce the effect that poorly analyzed morphemes have on sentence classification. The paper also proposes two noise insertion methods that further improve classification performance. Our evaluation results indicate that applying the proposed approach to existing classifiers increased their accuracy on erroneous sentences by 8% to 15%.
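
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that an eojeol vector can be built by averaging the embeddings of the subword units produced by several tokenizers (here just character-level and jamo-level splits), so that one badly analyzed unit does not dominate the result. It also assumes that one kind of noise insertion randomly drops or adds spaces to mimic spacing errors. Every function name, the hash-seeded `lookup` table standing in for trained embeddings, and the averaging rule are hypothetical illustrations, not the paper's actual method.

```python
import random
import unicodedata
from typing import Callable, Dict, List

Vector = List[float]
Tokenizer = Callable[[str], List[str]]


def split_eojeols(sentence: str) -> List[str]:
    """Eojeols are the space-separated units of a Korean sentence."""
    return sentence.split()


def char_tokens(eojeol: str) -> List[str]:
    """Character-level units: one token per Hangul syllable."""
    return list(eojeol)


def jamo_tokens(eojeol: str) -> List[str]:
    """Jamo-level units: NFD decomposes each Hangul syllable into conjoining jamo."""
    return list(unicodedata.normalize("NFD", eojeol))


def lookup(token: str, dim: int) -> Vector:
    """Hypothetical stand-in for a trained embedding table.

    A private RNG seeded with the token string maps the same token to the
    same pseudo-random vector on every run."""
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def integrated_eojeol_embedding(
    eojeol: str, tokenizers: Dict[str, Tokenizer], dim: int = 8
) -> Vector:
    """Assumed combination rule: average every subword vector from every tokenizer."""
    vectors = []
    for name, tokenize in tokenizers.items():
        for tok in tokenize(eojeol):
            # Prefix with the tokenizer name so each unit type gets its own table.
            vectors.append(lookup(f"{name}:{tok}", dim))
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def space_noise(sentence: str, p: float, rng: random.Random) -> str:
    """Hypothetical noise insertion: drop each existing space, or add a space
    between two adjacent non-space characters, each with probability p."""
    out = []
    for i, ch in enumerate(sentence):
        if ch == " ":
            if rng.random() >= p:
                out.append(ch)  # keep this space
            continue
        out.append(ch)
        nxt = sentence[i + 1] if i + 1 < len(sentence) else " "
        if nxt != " " and rng.random() < p:
            out.append(" ")  # spurious space inside an eojeol
    return "".join(out)


if __name__ == "__main__":
    TOKENIZERS = {"char": char_tokens, "jamo": jamo_tokens}
    sentence = "한국어 문장 분류"
    noisy = space_noise(sentence, p=0.3, rng=random.Random(0))
    # One integrated vector per eojeol of the (possibly noisy) sentence.
    matrix = [integrated_eojeol_embedding(e, TOKENIZERS) for e in split_eojeols(noisy)]
    print(noisy, len(matrix), len(matrix[0]))
```

In this sketch the sentence-level input to a downstream classifier would be the list of per-eojeol vectors; training on sentences passed through `space_noise` is one way to expose a classifier to spacing errors.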


Similar resources

Experiments with Sentence Classification

We present a set of experiments involving sentence classification, addressing issues of representation and feature selection, and we compare our findings with similar results from work on the more general text classification task. The domain of our investigation is an email-based help-desk corpus. Our investigations compare the use of various popular classification algorithms with various popul...


Integrated sentence generation with charts

Integrating surface realization and the generation of referring expressions (REs) into a single algorithm can improve the quality of the generated sentences. Existing algorithms for doing this, such as SPUD and CRISP, are search-based and can be slow or incomplete. We offer a chart-based algorithm for integrated sentence generation which supports efficient search through chart pruning.


Numeric-attribute-powered Sentence Embedding

Modern embedding methods focus only on the words in the text. The word or sentence embeddings are trained to represent the semantic meaning of the raw texts. However, many quantified attributes associated with the text, such as numeric attributes associated with Yelp review text, are ignored in the vector representation learning process. Those quantified numeric attributes can provide important...


A New Document Embedding Method for News Classification

Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into predefined categories. A great deal of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...


Enhancing Sentence Relation Modeling with Auxiliary Character-level Embedding

Neural-network-based approaches for sentence relation modeling automatically generate hidden matching features from raw sentence pairs. However, the quality of the matching feature representation may not be satisfactory due to complex semantic relations such as entailment or contradiction. To address this challenge, we propose a new deep neural network architecture that jointly leverages pre-trained wo...


Journal

Journal title: IEEE Access

Year: 2021

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2021.3085864